{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": [],
      "gpuType": "T4",
      "authorship_tag": "ABX9TyNO3mkZ+HMQrvkMHRtFpKvj",
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    },
    "accelerator": "GPU"
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/Vaibhavs10/insanely-fast-whisper/blob/main/insanely_fast_whisper_colab.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "# [Insanely Fast Whisper](https://github.com/Vaibhavs10/insanely-fast-whisper)\n",
        "\n",
        "By VB (https://twitter.com/reach_vb)\n",
        "\n",
        "P.S. Make sure you're on a GPU run-time 🤗"
      ],
      "metadata": {
        "id": "q0MBgZKbhdII"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "!pip install -q pipx && apt install python3.10-venv"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "VF-qp-FWJmyD",
        "outputId": "10712868-be6e-4b82-b8c2-95e43c591173"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "\u001b[?25l     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m0.0/57.8 kB\u001b[0m \u001b[31m?\u001b[0m eta \u001b[36m-:--:--\u001b[0m\r\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m57.8/57.8 kB\u001b[0m \u001b[31m2.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m41.7/41.7 kB\u001b[0m \u001b[31m5.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "Reading package lists... Done\n",
            "Building dependency tree... Done\n",
            "Reading state information... Done\n",
            "The following additional packages will be installed:\n",
            "  python3-pip-whl python3-setuptools-whl\n",
            "The following NEW packages will be installed:\n",
            "  python3-pip-whl python3-setuptools-whl python3.10-venv\n",
            "0 upgraded, 3 newly installed, 0 to remove and 9 not upgraded.\n",
            "Need to get 2,473 kB of archives.\n",
            "After this operation, 2,884 kB of additional disk space will be used.\n",
            "Get:1 http://archive.ubuntu.com/ubuntu jammy-updates/universe amd64 python3-pip-whl all 22.0.2+dfsg-1ubuntu0.4 [1,680 kB]\n",
            "Get:2 http://archive.ubuntu.com/ubuntu jammy-updates/universe amd64 python3-setuptools-whl all 59.6.0-1.2ubuntu0.22.04.1 [788 kB]\n",
            "Get:3 http://archive.ubuntu.com/ubuntu jammy-updates/universe amd64 python3.10-venv amd64 3.10.12-1~22.04.2 [5,724 B]\n",
            "Fetched 2,473 kB in 2s (1,635 kB/s)\n",
            "Selecting previously unselected package python3-pip-whl.\n",
            "(Reading database ... 120880 files and directories currently installed.)\n",
            "Preparing to unpack .../python3-pip-whl_22.0.2+dfsg-1ubuntu0.4_all.deb ...\n",
            "Unpacking python3-pip-whl (22.0.2+dfsg-1ubuntu0.4) ...\n",
            "Selecting previously unselected package python3-setuptools-whl.\n",
            "Preparing to unpack .../python3-setuptools-whl_59.6.0-1.2ubuntu0.22.04.1_all.deb ...\n",
            "Unpacking python3-setuptools-whl (59.6.0-1.2ubuntu0.22.04.1) ...\n",
            "Selecting previously unselected package python3.10-venv.\n",
            "Preparing to unpack .../python3.10-venv_3.10.12-1~22.04.2_amd64.deb ...\n",
            "Unpacking python3.10-venv (3.10.12-1~22.04.2) ...\n",
            "Setting up python3-setuptools-whl (59.6.0-1.2ubuntu0.22.04.1) ...\n",
            "Setting up python3-pip-whl (22.0.2+dfsg-1ubuntu0.4) ...\n",
            "Setting up python3.10-venv (3.10.12-1~22.04.2) ...\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "!pipx run insanely-fast-whisper --file-name https://huggingface.co/datasets/reach-vb/random-audios/resolve/main/ted_60.wav"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "i_H9Dm89Jj0-",
        "outputId": "f737b9fd-d625-4ccd-d8a1-1895cdf1b22f"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "config.json: 100% 1.25k/1.25k [00:00<00:00, 6.33MB/s]\n",
            "model.safetensors: 100% 3.09G/3.09G [00:12<00:00, 242MB/s]\n",
            "generation_config.json: 100% 3.87k/3.87k [00:00<00:00, 17.3MB/s]\n",
            "tokenizer_config.json: 100% 283k/283k [00:00<00:00, 2.15MB/s]\n",
            "vocab.json: 100% 1.04M/1.04M [00:00<00:00, 5.28MB/s]\n",
            "tokenizer.json: 100% 2.48M/2.48M [00:00<00:00, 9.49MB/s]\n",
            "merges.txt: 100% 494k/494k [00:00<00:00, 3.74MB/s]\n",
            "normalizer.json: 100% 52.7k/52.7k [00:00<00:00, 97.3MB/s]\n",
            "added_tokens.json: 100% 34.6k/34.6k [00:00<00:00, 110MB/s]\n",
            "special_tokens_map.json: 100% 2.07k/2.07k [00:00<00:00, 8.95MB/s]\n",
            "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n",
            "preprocessor_config.json: 100% 340/340 [00:00<00:00, 1.98MB/s]\n",
            "The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.\n",
            "\u001b[2K🤗 \u001b[33mTranscribing...\u001b[0m \u001b[37m━\u001b[0m\u001b[37m━\u001b[0m\u001b[93m━\u001b[0m\u001b[93m━\u001b[0m\u001b[93m━\u001b[0m\u001b[93m━\u001b[0m\u001b[93m━\u001b[0m\u001b[93m━\u001b[0m\u001b[93m━\u001b[0m\u001b[93m━\u001b[0m\u001b[93m━\u001b[0m\u001b[93m━\u001b[0m\u001b[93m━\u001b[0m\u001b[93m━\u001b[0m\u001b[93m━\u001b[0m\u001b[37m━\u001b[0m\u001b[37m━\u001b[0m\u001b[37m━\u001b[0m\u001b[37m━\u001b[0m\u001b[37m━\u001b[0m\u001b[37m━\u001b[0m\u001b[37m━\u001b[0m\u001b[93m━\u001b[0m\u001b[93m━\u001b[0m\u001b[93m━\u001b[0m\u001b[93m━\u001b[0m\u001b[93m━\u001b[0m\u001b[93m━\u001b[0m\u001b[93m━\u001b[0m\u001b[93m━\u001b[0m\u001b[93m━\u001b[0m\u001b[93m━\u001b[0m\u001b[93m━\u001b[0m\u001b[93m━\u001b[0m\u001b[93m━\u001b[0m\u001b[37m━\u001b[0m\u001b[37m━\u001b[0m\u001b[37m━\u001b[0m\u001b[37m━\u001b[0m\u001b[37m━\u001b[0m \u001b[33m0:00:09\u001b[0m\n",
            "\u001b[?25hVoila! Your file has been transcribed go check it out over here! output.json\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "!head output.json"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "NDFrydpsvu57",
        "outputId": "de3d9635-5cf1-46ca-d401-e6c78c5659dc"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "{\"text\": \" So in college, I was a government major, which means I had to write a lot of papers. Now, when a normal student writes a paper, they might spread the work out a little like this. So, you know, you get started maybe a little slowly, but you get enough done in the first week that with some heavier days later on, everything gets done and things stay civil. And I would want to do that, like that. That would be the plan. I would have it all ready to go, but then actually the paper would come along, and then I would kind of do this. And that would happen to every single paper. But then came my 90-page senior thesis, a paper you're supposed to spend a year on. I knew for a paper like that, my normal workflow was not an option. It was way too big a project. So I planned things out, and I decided it kind of had to go something like this. This is how the year would go. So I'd start off light,\", \"chunks\": [{\"timestamp\": [0.0, 4.48], \"text\": \" So in college, I was a government major,\"}, {\"timestamp\": [4.88, 6.62], \"text\": \" which means I had to write a lot of papers.\"}, {\"timestamp\": [7.42, 8.86], \"text\": \" Now, when a normal student writes a paper,\"}, {\"timestamp\": [8.94, 10.6], \"text\": \" they might spread the work out a little like this.\"}, {\"timestamp\": [11.74, 16.3], \"text\": \" So, you know, you get started maybe a little slowly,\"}, {\"timestamp\": [16.36, 17.86], \"text\": \" but you get enough done in the first week\"}, {\"timestamp\": [17.86, 19.76], \"text\": \" that with some heavier days later on,\"}, {\"timestamp\": [20.28, 21.98], \"text\": \" everything gets done and things stay civil.\"}, {\"timestamp\": [23.64, 25.8], \"text\": \" And I would want to do that, like that.\"}, {\"timestamp\": [26.12, 26.94], \"text\": \" That would be the plan.\"}, {\"timestamp\": [27.22, 29.84], \"text\": \" I would have it all ready to go,\"}, {\"timestamp\": [29.96, 32.42], \"text\": \" but then actually the paper would come along,\"}, {\"timestamp\": [32.46, 33.6], \"text\": \" and then I would kind of do this.\"}, {\"timestamp\": [36.48, 38.44], \"text\": \" And that would happen to every single paper.\"}, {\"timestamp\": [39.32, 43.04], \"text\": \" But then came my 90-page senior thesis,\"}, {\"timestamp\": [43.54, 46.0], \"text\": \" a paper you're supposed to spend a year on.\"}, {\"timestamp\": [46.0, 50.0], \"text\": \" I knew for a paper like that, my normal workflow was not an option.\"}, {\"timestamp\": [50.0, 52.0], \"text\": \" It was way too big a project.\"}, {\"timestamp\": [52.0, 56.0], \"text\": \" So I planned things out, and I decided it kind of had to go something like this.\"}, {\"timestamp\": [56.0, 58.0], \"text\": \" This is how the year would go.\"}, {\"timestamp\": [58.0, 60.0], \"text\": \" So I'd start off light,\"}]}"
          ]
        }
      ]
    }
  ]
}